04 - R framework with IMPACT - session 4

Author

Yann Say

Published

May 1, 2024

library(analysistools)
library(dplyr)

my_data <- analysistools::analysistools_MSNA_template_data

sampling_frame <- data.frame(
  strata = c("admin1a", "admin1b", "admin1c"),
  population = c(100000, 200000, 300000)
)

set.seed(1323)
my_data <- my_data |>
  mutate(num_aged_school_children = round(runif(100, min = 0, max = 5)),
         num_enrolled_school_children = round(runif(100, min = 0, max = 5)),
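         num_enrolled_school_children = case_when(
           # No school-aged children: enrolment is not applicable
           num_aged_school_children == 0 ~ NA,
           # Enrolled children cannot exceed school-aged children
           num_aged_school_children < num_enrolled_school_children ~ num_aged_school_children,
           TRUE ~ num_enrolled_school_children
         ))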

Analysis - extended analysis

The framework is built around 4 steps: cleaning, composition, analysis, outputs

  • Cleaning: Any manipulation to go from the raw data to the clean data
  • Composition: Any manipulation before the analysis e.g. adding indicators, adding information from loop, main dataset, or any other dataset (e.g. previous round), aok aggregation, etc.
  • Analysis: Any manipulation regarding only the analysis.
  • Outputs: Any manipulation to format the outputs. Outputs are created from the results table, from the stat + analysis key

The following section introduces the analysis step.

The third step of the framework is the analysis. The analysis step aims to create a long table with one result per line and an analysis key. That table is not meant to be read by a human but to store information.

The analysis key format is currently:

  • analysis type @/@ analysis variable %/% analysis variable value @/@ grouping variable %/% grouping variable value

  • analysis type @/@ dependent variable %/% dependent variable value @/@ independent variable %/% independent variable value

If there are two or more grouping variables, the key looks like this:

  • analysis type @/@ analysis variable %/% analysis variable value @/@ grouping variable 1 %/% grouping variable value 1 -/- grouping variable 2 %/% grouping variable value 2

The same applies to the analysis variables in the case of a ratio.
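
To make the key format concrete, here is a minimal sketch that splits a key back into its components. `split_analysis_key` is a hypothetical helper written for this illustration, not a function of analysistools, and it only handles the simple case of one analysis variable and one grouping variable (no `-/-` separator). The example key is one that appears later in this session.

```r
# Hypothetical helper (not part of analysistools): split an analysis key
# into its components using the separators described above.
split_analysis_key <- function(key) {
  # First level: analysis type, analysis part, grouping part
  parts <- strsplit(key, " @/@ ", fixed = TRUE)[[1]]
  # Second level: variable name and variable value
  analysis <- strsplit(parts[2], " %/% ", fixed = TRUE)[[1]]
  group <- strsplit(parts[3], " %/% ", fixed = TRUE)[[1]]
  list(
    analysis_type = parts[1],
    analysis_var = analysis[1],
    analysis_var_value = analysis[2],
    group_var = group[1],
    group_var_value = group[2]
  )
}

key <- "prop_select_one @/@ wash_drinkingwatersource %/% water_kiosk @/@ admin1 %/% admin1a"
split_analysis_key(key)
```

Missing components come back as the text "NA", since that is how they are written in the key.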

The current analysis types are:

  • mean
  • median
  • prop_select_one: proportion for select one
  • prop_select_multiple: proportion for select multiple
  • ratio
create_*

create_* functions will create, transform something, e.g. creating a cleaning log with the checks to be filled, create analysis results table, create an output.

The outputs of create_* functions can come in different shapes, formats, etc.

create_* is a catch-all family of functions.

create_analysis and list of analysis (loa)

The list of analysis (loa) is a list of all the analyses to be performed. It takes the form of a data frame with at least 4 columns:

  • analysis_type : The analysis type that should be performed.
  • analysis_var : The analysis variable or dependent variable.
  • group_var : The grouping variable or independent variable.
  • level : The confidence level (expressed between 0 and 1).
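
A loa does not have to come from a template. As a minimal sketch, the four columns listed above can be written by hand in a data frame; the variable names used here are the ones from the template data of this session, and `custom_loa` is only an illustrative name.

```r
# Illustrative loa built by hand with the four minimum columns.
custom_loa <- data.frame(
  analysis_type = c("prop_select_one", "mean", "mean"),
  analysis_var = c("wash_drinkingwatersource", "expenditure_debt", "expenditure_debt"),
  group_var = c(NA, NA, "admin1"),  # NA means no disaggregation
  level = c(0.95, 0.95, 0.95)       # confidence level between 0 and 1
)

custom_loa
```
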
my_loa <- analysistools::analysistools_MSNA_template_loa

my_loa
analysis_type analysis_var group_var level
prop_select_one admin1 NA 0.95
mean income_v1_salaried_work NA 0.95
median income_v1_salaried_work NA 0.95
mean expenditure_debt NA 0.95
median expenditure_debt NA 0.95
prop_select_one wash_drinkingwatersource NA 0.95
prop_select_multiple edu_learning_conditions_reasons_v1 NA 0.95
mean income_v1_salaried_work admin1 0.95
median income_v1_salaried_work admin1 0.95
mean expenditure_debt admin1 0.95
median expenditure_debt admin1 0.95
prop_select_one wash_drinkingwatersource admin1 0.95
prop_select_multiple edu_learning_conditions_reasons_v1 admin1 0.95

The loa can be passed as argument to the create_analysis function.

my_data <- my_data %>% 
  add_weights(sampling_frame, "admin1", "strata", "population")

my_design <- srvyr::as_survey_design(my_data, weights = "weights", strata = "admin1")
my_results <- create_analysis(my_design, loa = my_loa, sm_separator = "/")
head_results_table <- my_results$results_table %>% 
  head(5)

tail_results_table <- my_results$results_table %>% 
  tail(5)

rbind(head_results_table,tail_results_table)
analysis_type analysis_var analysis_var_value group_var group_var_value stat stat_low stat_upp n n_total n_w n_w_total analysis_key
prop_select_one admin1 admin1a NA NA 0.1666667 0.1666667 0.1666667 31 100 16.66667 100 prop_select_one @/@ admin1 %/% admin1a @/@ NA %/% NA
prop_select_one admin1 admin1b NA NA 0.3333333 0.3333333 0.3333333 27 100 33.33333 100 prop_select_one @/@ admin1 %/% admin1b @/@ NA %/% NA
prop_select_one admin1 admin1c NA NA 0.5000000 0.5000000 0.5000000 42 100 50.00000 100 prop_select_one @/@ admin1 %/% admin1c @/@ NA %/% NA
mean income_v1_salaried_work NA NA NA 20.0472777 19.6448025 20.4497529 100 100 100.00000 100 mean @/@ income_v1_salaried_work %/% NA @/@ NA %/% NA
median income_v1_salaried_work NA NA NA 20.0000000 20.0000000 21.0000000 100 100 100.00000 100 median @/@ income_v1_salaried_work %/% NA @/@ NA %/% NA
prop_select_multiple edu_learning_conditions_reasons_v1 unreliable_technology admin1 admin1c 0.5000000 0.3450192 0.6549808 21 42 25.00000 50 prop_select_multiple @/@ edu_learning_conditions_reasons_v1 %/% unreliable_technology @/@ admin1 %/% admin1c
prop_select_multiple edu_learning_conditions_reasons_v1 lack_equipment admin1 admin1c 0.5714286 0.4180373 0.7248198 24 42 28.57143 50 prop_select_multiple @/@ edu_learning_conditions_reasons_v1 %/% lack_equipment @/@ admin1 %/% admin1c
prop_select_multiple edu_learning_conditions_reasons_v1 other admin1 admin1c 0.4047619 0.2526185 0.5569053 17 42 20.23810 50 prop_select_multiple @/@ edu_learning_conditions_reasons_v1 %/% other @/@ admin1 %/% admin1c
prop_select_multiple edu_learning_conditions_reasons_v1 dont_know admin1 admin1c 0.5000000 0.3450192 0.6549808 21 42 25.00000 50 prop_select_multiple @/@ edu_learning_conditions_reasons_v1 %/% dont_know @/@ admin1 %/% admin1c
prop_select_multiple edu_learning_conditions_reasons_v1 prefer_not_to_answer admin1 admin1c 0.4285714 0.2751802 0.5819627 18 42 21.42857 50 prop_select_multiple @/@ edu_learning_conditions_reasons_v1 %/% prefer_not_to_answer @/@ admin1 %/% admin1c
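
The weighted counts in the table above (for example n_w = 16.67 for the 31 interviews in admin1a) are consistent with weights computed as the population share of a stratum divided by its sample share. The sketch below reproduces those numbers under that assumption. It is not the code of add_weights; `weights_check` is an illustrative name and `sample_size` is copied from the n column of the results.

```r
weights_check <- data.frame(
  strata = c("admin1a", "admin1b", "admin1c"),
  population = c(100000, 200000, 300000),
  sample_size = c(31, 27, 42)  # n per stratum, read from the results table
)

# Assumed formula: population share divided by sample share
weights_check$weight <-
  (weights_check$population / sum(weights_check$population)) /
  (weights_check$sample_size / sum(weights_check$sample_size))

# Weighted count per stratum, comparable to the n_w column
weights_check$n_w <- weights_check$weight * weights_check$sample_size

weights_check
```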

create_analysis_ratio

There are two ways to calculate a ratio: the function create_analysis_ratio, or create_analysis with a loa that contains additional columns (analysis_var_numerator and analysis_var_denominator).

my_loa_with_ratio <- read.csv("inputs/07 - example - loa_with_ratio.csv")
my_loa_with_ratio %>% 
  filter(analysis_type == "ratio") |>
  select(analysis_type, analysis_var, group_var, analysis_var_numerator, analysis_var_denominator)
analysis_type analysis_var group_var analysis_var_numerator analysis_var_denominator
ratio NA NA num_enrolled_school_children num_aged_school_children
ratio NA admin1 num_enrolled_school_children num_aged_school_children
my_results_with_ratio <- create_analysis(my_design, loa = my_loa_with_ratio, sm_separator = "/")
my_results_with_ratio$results_table %>% 
  filter(analysis_type == "ratio")
analysis_type analysis_var analysis_var_value group_var group_var_value stat stat_low stat_upp n n_total n_w n_w_total analysis_key
ratio num_enrolled_school_children %/% num_aged_school_children NA %/% NA NA NA 0.6874563 0.6054471 0.7694654 89 89 88.12226 88.12226 ratio @/@ num_enrolled_school_children %/% NA -/- num_aged_school_children %/% NA @/@ NA %/% NA
ratio num_enrolled_school_children %/% num_aged_school_children NA %/% NA admin1 admin1a 0.6881720 0.5646192 0.8117249 29 29 15.59140 15.59140 ratio @/@ num_enrolled_school_children %/% NA -/- num_aged_school_children %/% NA @/@ admin1 %/% admin1a
ratio num_enrolled_school_children %/% num_aged_school_children NA %/% NA admin1 admin1b 0.6461538 0.4939644 0.7983432 25 25 30.86420 30.86420 ratio @/@ num_enrolled_school_children %/% NA -/- num_aged_school_children %/% NA @/@ admin1 %/% admin1b
ratio num_enrolled_school_children %/% num_aged_school_children NA %/% NA admin1 admin1c 0.7173913 0.5887553 0.8460273 35 35 41.66667 41.66667 ratio @/@ num_enrolled_school_children %/% NA -/- num_aged_school_children %/% NA @/@ admin1 %/% admin1c

Arguments of create_analysis_ratio

create_analysis_ratio has two arguments, numerator_NA_to_0 and filter_denominator_0, that are both set to TRUE by default.

  • numerator_NA_to_0 will turn all NA of the numerator into 0’s, default TRUE.

  • filter_denominator_0 will remove all rows with 0’s in the denominator, default TRUE.

The following example shows a dataset with the number of children (num_children), the number of children enrolled in a school (num_enrolled) and the number of children attending school on a regular basis (num_attending).

school_ex <- data.frame(
  hh = c("hh1", "hh2", "hh3", "hh4"),
  num_children = c(3, 0, 2, NA),
  num_enrolled = c(3, NA, 0, NA),
  num_attending = c(1, NA, NA, NA)
  )

me_design <- srvyr::as_survey(school_ex)

school_ex
hh num_children num_enrolled num_attending
hh1 3 3 1
hh2 0 NA NA
hh3 2 0 NA
hh4 NA NA NA
  • What is the ratio between children attending school and the number of children ?
  • How many households are included in the calculation?

With the default values, the ratio is 0.2, as 1 child out of 5 attends school.

numerator: 1 child from hh1 and 0 from hh3.

denominator: 3 from hh1 and 2 from hh3. In hh3, num_attending is NA because of a skip logic: no child can be attending as none are enrolled.

By default, the function has the argument numerator_NA_to_0 set to TRUE to turn that NA into a 0.

n and n_total are 2 as 2 households were included in the calculation. hh2 was not included in the calculation of totals: the argument filter_denominator_0, set to TRUE, removes that row. hh4 is not counted either, as its number of children is NA.

create_analysis_ratio(me_design,
  analysis_var_numerator = "num_attending",
  analysis_var_denominator = "num_children") %>%
  select(analysis_type, analysis_var, stat, n, n_total, analysis_key)
analysis_type analysis_var stat n n_total analysis_key
ratio num_attending %/% num_children 0.2 2 2 ratio @/@ num_attending %/% NA -/- num_children %/% NA @/@ NA %/% NA
  • What will be the ratio if only numerator_NA_to_0 is set to FALSE ?
  • How many households are included in the calculation?

The ratio will be 1/3: hh3, with 2 children and NA for attending, is removed by the na.rm = T inside the survey_ratio calculation.

n and n_total are 1 as only 1 household was used.

create_analysis_ratio(me_design,
                      analysis_var_numerator = "num_attending",
                      analysis_var_denominator = "num_children",
                      numerator_NA_to_0 = FALSE) %>% 
  select(analysis_type, analysis_var, stat, n, n_total, analysis_key)
Warning: There were 2 warnings in `dplyr::summarise()`.
The first warning was:
ℹ In argument: `srvyr::survey_ratio(...)`.
Caused by warning in `qt()`:
! NaNs produced
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
analysis_type analysis_var stat n n_total analysis_key
ratio num_attending %/% num_children 0.3333333 1 1 ratio @/@ num_attending %/% NA -/- num_children %/% NA @/@ NA %/% NA
  • What will be the ratio if only filter_denominator_0 is set to FALSE ?
  • How many households are included in the calculation?

The ratio will be 0.2, as 1 child out of 5 attends school: (1 + 0 + 0) / (3 + 0 + 2). The number of households counted, n and n_total, is 3 instead of 2, because the household with 0 children is now counted in the totals.

create_analysis_ratio(me_design,
                      analysis_var_numerator = "num_attending",
                      analysis_var_denominator = "num_children",
                      filter_denominator_0 = FALSE)  %>% 
  select(analysis_type, analysis_var, stat, n, n_total, analysis_key)
analysis_type analysis_var stat n n_total analysis_key
ratio num_attending %/% num_children 0.2 3 3 ratio @/@ num_attending %/% NA -/- num_children %/% NA @/@ NA %/% NA
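
The three results above can be checked by hand. The sketch below is an unweighted, base R imitation of the behaviour described in this section; `manual_ratio` and `school_check` are hypothetical names for this illustration and this is not how create_analysis_ratio is implemented. It assumes that rows with an NA in the numerator or the denominator are dropped, which matches the n values shown above.

```r
school_check <- data.frame(
  hh = c("hh1", "hh2", "hh3", "hh4"),
  num_children = c(3, 0, 2, NA),
  num_attending = c(1, NA, NA, NA)
)

# Hypothetical helper: unweighted ratio with the two options discussed above.
manual_ratio <- function(df, numerator, denominator,
                         numerator_NA_to_0 = TRUE,
                         filter_denominator_0 = TRUE) {
  num <- df[[numerator]]
  den <- df[[denominator]]
  # Optionally turn NA in the numerator into 0
  if (numerator_NA_to_0) num[is.na(num)] <- 0
  # Rows with a remaining NA are dropped (assumed na.rm behaviour)
  keep <- !is.na(num) & !is.na(den)
  # Optionally drop rows where the denominator is 0
  if (filter_denominator_0) keep <- keep & den != 0
  c(stat = sum(num[keep]) / sum(den[keep]), n = sum(keep))
}

manual_ratio(school_check, "num_attending", "num_children")
manual_ratio(school_check, "num_attending", "num_children", numerator_NA_to_0 = FALSE)
manual_ratio(school_check, "num_attending", "num_children", filter_denominator_0 = FALSE)
```

The three calls correspond, in order, to the default case, numerator_NA_to_0 = FALSE and filter_denominator_0 = FALSE.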

Analysis - Reviewing the analysis

review_*

review_* functions review an object by comparing it to standards or to another object and flag the differences, e.g. reviewing the cleaning by comparing the raw dataset, the clean dataset and the cleaning log, or reviewing an analysis by comparing it with another analysis.

create_loa_from_results

If the loa that was used was shared, it can be re-used. Otherwise, the function create_loa_from_results will generate a loa from the results table and its analysis key; that loa can then be used to re-create the analysis for the review.

my_loa_for_review <- my_results_with_ratio$results_table %>% 
  create_loa_from_results()

my_loa_for_review
analysis_type analysis_var group_var level analysis_var_numerator analysis_var_denominator numerator_NA_to_0 filter_denominator_0
prop_select_one admin1 NA 0.95 NA NA NA NA
mean income_v1_salaried_work NA 0.95 NA NA NA NA
median income_v1_salaried_work NA 0.95 NA NA NA NA
mean expenditure_debt NA 0.95 NA NA NA NA
median expenditure_debt NA 0.95 NA NA NA NA
ratio NA NA 0.95 num_enrolled_school_children num_aged_school_children TRUE TRUE
prop_select_one wash_drinkingwatersource NA 0.95 NA NA NA NA
prop_select_multiple edu_learning_conditions_reasons_v1 NA 0.95 NA NA NA NA
mean income_v1_salaried_work admin1 0.95 NA NA NA NA
median income_v1_salaried_work admin1 0.95 NA NA NA NA
mean expenditure_debt admin1 0.95 NA NA NA NA
median expenditure_debt admin1 0.95 NA NA NA NA
ratio NA admin1 0.95 num_enrolled_school_children num_aged_school_children TRUE TRUE
prop_select_one wash_drinkingwatersource admin1 0.95 NA NA NA NA
prop_select_multiple edu_learning_conditions_reasons_v1 admin1 0.95 NA NA NA NA
Note

create_loa_from_results will not guess the arguments numerator_NA_to_0 and filter_denominator_0; they will be set to TRUE by default.

The confidence level will also be set to 0.95 by default.

review_analysis

review_analysis compares 2 sets of results and presents the differences. It does not check how the analysis was created, nor does it check for inconsistencies. That means that, to review an analysis, it is necessary to create a second one and compare the two.
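
The idea of comparing two sets of results can be sketched in base R: join the two tables on the analysis key and flag the rows where the statistics differ or are missing. The values below are made up for the illustration, the object names are hypothetical, and the logic is a simplification, not the code of review_analysis.

```r
# Made-up results from two analysts, sharing the same analysis keys
results_a <- data.frame(
  analysis_key = c(
    "mean @/@ expenditure_debt %/% NA @/@ NA %/% NA",
    "median @/@ expenditure_debt %/% NA @/@ NA %/% NA",
    "mean @/@ income_v1_salaried_work %/% NA @/@ NA %/% NA"
  ),
  stat = c(20.2, 20, 20.05)
)
results_b <- data.frame(
  analysis_key = results_a$analysis_key,
  stat = c(20.2, 21, NA)  # one different value, one missing value
)

# Join on the key; the stat columns become stat.x and stat.y
compared <- merge(results_a, results_b, by = "analysis_key", suffixes = c(".x", ".y"))

# A row passes only if both values exist and are (almost) equal
compared$same <- !is.na(compared$stat.x) & !is.na(compared$stat.y) &
  abs(compared$stat.x - compared$stat.y) < 1e-8

compared
```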

review_* functions can be used to check, for example:

  • whether the cleaning log has been filled correctly
  • whether the cleaning has been done correctly
  • how indicators compare
  • how analyses compare
  • etc.
my_design_for_review <- srvyr::as_survey_design(my_data, weights = "weights", strata = "admin1")
analysis_for_review <- create_analysis(my_design_for_review, my_loa_for_review, sm_separator = "/")

binded_table <- my_results_with_ratio$results_table %>% 
  left_join(analysis_for_review$results_table, by = "analysis_key")
my_review <- review_analysis(binded_table)

typeof(my_review)
[1] "list"
names(my_review)
[1] "results_table" "review_table" 
my_review$review_table %>%
  head()
analysis_key stat review_check review_comment analysis_type analysis_var group_var
prop_select_one @/@ admin1 %/% admin1a @/@ NA %/% NA stat.x TRUE Same results prop_select_one admin1 NA
prop_select_one @/@ admin1 %/% admin1b @/@ NA %/% NA stat.x TRUE Same results prop_select_one admin1 NA
prop_select_one @/@ admin1 %/% admin1c @/@ NA %/% NA stat.x TRUE Same results prop_select_one admin1 NA
mean @/@ income_v1_salaried_work %/% NA @/@ NA %/% NA stat.x TRUE Same results mean income_v1_salaried_work NA
median @/@ income_v1_salaried_work %/% NA @/@ NA %/% NA stat.x TRUE Same results median income_v1_salaried_work NA
mean @/@ expenditure_debt %/% NA @/@ NA %/% NA stat.x TRUE Same results mean expenditure_debt NA
my_review$review_table %>%
  group_by(stat, review_check, review_comment) %>%
  tally()
stat review_check review_comment n
stat.x TRUE Same results 147
Note

An analysis_key is the equivalent of a unique identifier. All analysis keys should be unique.
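
A quick way to check uniqueness before joining two results tables is to look for duplicated keys. This is a minimal sketch on a made-up vector with hypothetical object names; on a real results table the same check could be run on its analysis_key column.

```r
keys <- c(
  "mean @/@ expenditure_debt %/% NA @/@ NA %/% NA",
  "median @/@ expenditure_debt %/% NA @/@ NA %/% NA",
  "mean @/@ expenditure_debt %/% NA @/@ NA %/% NA"  # deliberate duplicate
)

# Keys that appear more than once; an empty result means all keys are unique
duplicated_keys <- unique(keys[duplicated(keys)])
duplicated_keys
```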

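# Copy the joined table and deliberately corrupt some values to show what the review flags
jittered_results_table <- binded_table
# Replace 5 values of stat.x, then 5 values of stat.y, with values drawn from the same column
set.seed(123)
jittered_results_table[sample(1:nrow(jittered_results_table), 5), "stat.x"] <- sample(unique(jittered_results_table$stat.x), 5, replace = TRUE)
set.seed(124)
jittered_results_table[sample(1:nrow(jittered_results_table), 5), "stat.y"] <- sample(unique(jittered_results_table$stat.y), 5, replace = TRUE)
# Set 5 values of stat.x, then 5 values of stat.y, to missing
set.seed(125)
jittered_results_table[sample(1:nrow(jittered_results_table), 5), "stat.x"] <- NA
set.seed(1236)
jittered_results_table[sample(1:nrow(jittered_results_table), 5), "stat.y"] <- NA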
my_jittered_review <- review_analysis(jittered_results_table, 
                                      stat_columns_to_review = "stat.x",
                                      stat_columns_to_compare_with = "stat.y")
my_jittered_review$review_table %>%
  group_by(stat, review_check, review_comment) %>%
  tally()
stat review_check review_comment n
stat.x FALSE Different results 10
stat.x FALSE Missing in stat.x 5
stat.x FALSE Missing in stat.y 5
stat.x TRUE Same results 127
my_jittered_review$results_table %>%
  filter(!review_check_stat.x) %>% 
  head(10)
analysis_type.x analysis_var.x analysis_var_value.x group_var.x group_var_value.x stat.x stat_low.x stat_upp.x n.x n_total.x n_w.x n_w_total.x analysis_key analysis_type.y analysis_var.y analysis_var_value.y group_var.y group_var_value.y stat.y stat_low.y stat_upp.y n.y n_total.y n_w.y n_w_total.y review_check_stat.x review_comment_stat.x
prop_select_one wash_drinkingwatersource borehole_tubewell NA NA NA -0.0009366 0.0839997 4 100 4.1531547 100.00000 prop_select_one @/@ wash_drinkingwatersource %/% borehole_tubewell @/@ NA %/% NA prop_select_one wash_drinkingwatersource borehole_tubewell NA NA 0.0415315 -0.0009366 0.0839997 4 100 4.1531547 100.00000 FALSE Missing in stat.x
prop_select_one wash_drinkingwatersource piped_into_compound NA NA 0.0657820 0.0199103 0.1354632 7 100 7.7686750 100.00000 prop_select_one @/@ wash_drinkingwatersource %/% piped_into_compound @/@ NA %/% NA prop_select_one wash_drinkingwatersource piped_into_compound NA NA 0.0776867 0.0199103 0.1354632 7 100 7.7686750 100.00000 FALSE Different results
prop_select_one wash_drinkingwatersource rain_water_collection NA NA 0.0291859 -0.0055007 0.0638725 3 100 2.9185868 100.00000 prop_select_one @/@ wash_drinkingwatersource %/% rain_water_collection @/@ NA %/% NA prop_select_one wash_drinkingwatersource rain_water_collection NA NA 0.4285714 -0.0055007 0.0638725 3 100 2.9185868 100.00000 FALSE Different results
prop_select_one wash_drinkingwatersource unprotected_spring NA NA NA -0.0055476 0.0656830 3 100 3.0067702 100.00000 prop_select_one @/@ wash_drinkingwatersource %/% unprotected_spring @/@ NA %/% NA prop_select_one wash_drinkingwatersource unprotected_spring NA NA 0.0300677 -0.0055476 0.0656830 3 100 3.0067702 100.00000 FALSE Missing in stat.x
prop_select_multiple edu_learning_conditions_reasons_v1 lack_qualified_staff NA NA NA 0.4723972 0.6786524 57 100 57.5524834 100.00000 prop_select_multiple @/@ edu_learning_conditions_reasons_v1 %/% lack_qualified_staff @/@ NA %/% NA prop_select_multiple edu_learning_conditions_reasons_v1 lack_qualified_staff NA NA 0.5755248 0.4723972 0.6786524 57 100 57.5524834 100.00000 FALSE Missing in stat.x
mean income_v1_salaried_work NA admin1 admin1b 0.1612903 19.0489416 20.5066140 27 27 33.3333333 33.33333 mean @/@ income_v1_salaried_work %/% NA @/@ admin1 %/% admin1b mean income_v1_salaried_work NA admin1 admin1b 19.7777778 19.0489416 20.5066140 27 27 33.3333333 33.33333 FALSE Different results
mean expenditure_debt NA admin1 admin1c 0.5229561 19.6006716 20.8755188 42 42 50.0000000 50.00000 mean @/@ expenditure_debt %/% NA @/@ admin1 %/% admin1c mean expenditure_debt NA admin1 admin1c 20.2380952 19.6006716 20.8755188 42 42 50.0000000 50.00000 FALSE Different results
prop_select_one wash_drinkingwatersource water_kiosk admin1 admin1a NA -0.0317653 0.0962814 1 31 0.5376344 16.66667 prop_select_one @/@ wash_drinkingwatersource %/% water_kiosk @/@ admin1 %/% admin1a prop_select_one wash_drinkingwatersource water_kiosk admin1 admin1a 0.0322581 -0.0317653 0.0962814 1 31 0.5376344 16.66667 FALSE Missing in stat.x
prop_select_one wash_drinkingwatersource bottled_water admin1 admin1b 0.0370370 -0.0364712 0.1105453 1 27 1.2345679 33.33333 prop_select_one @/@ wash_drinkingwatersource %/% bottled_water @/@ admin1 %/% admin1b prop_select_one wash_drinkingwatersource bottled_water admin1 admin1b 19.8709677 -0.0364712 0.1105453 1 27 1.2345679 33.33333 FALSE Different results
prop_select_one wash_drinkingwatersource other admin1 admin1c 0.0476190 -0.0183900 0.1136281 2 42 2.3809524 50.00000 prop_select_one @/@ wash_drinkingwatersource %/% other @/@ admin1 %/% admin1c prop_select_one wash_drinkingwatersource other admin1 admin1c 20.0000000 -0.0183900 0.1136281 2 42 2.3809524 50.00000 FALSE Different results

Exercises

Exercise 1

library(analysistools)
library(dplyr)

my_data <- analysistools::analysistools_MSNA_template_data

sampling_frame <- data.frame(
  strata = c("admin1a", "admin1b", "admin1c"),
  population = c(100000, 200000, 300000)
)

Create the analysis for the following indicators:

  • % of households having/had soap at home
  • % of households by type of primary source of drinking water
  • % of households by self-reported barriers to accessing health care
  • Average household income over the 30 days prior to data collection (total)
  • Median household income over the 30 days prior to data collection (total)
  • Average household expenditures in the 6 months prior to data collection (health)
  • Median household expenditures in the 6 months prior to data collection (health)
  • Ratio household expenditures on health in the 6 months prior to data collection and the household income over the 30 days prior to data collection.
  • % of households per number of days when the household had to restrict consumption by adults in order for small children to eat to cope with a lack of food or money to buy it.

The analysis should be at admin1 level (the strata).

name label::english type
income_v2_total Can you estimate your household’s total income (in local currency) over the last 30 days from all sources? Please only report income received in the form of money, not items or services. integer
expenditure_health 3. Health-related expenditures (healthcare, medicine, etc.) integer
rCSIMealAdult During the last 7 days, were there days (and, if so, how many) when your household had to restrict consumption by adults in order for small children to eat to cope with a lack of food or money to buy it? integer
wash_drinkingwatersource What is the main source of drinking water for members of your household? select_one wash_drinkingwatersource
wash_soap Do you have soap or detergent in your household for washing hands? [if not remote] Can you show it to me? select_one wash_soap
health_barriers What are your barriers to access health care? select_multiple health_barriers
exercise_data <- analysistools::analysistools_MSNA_template_data

exercise_sampling_frame <- data.frame(
  strata = c("admin1a", "admin1b", "admin1c"),
  population = c(100000, 200000, 300000)
)

template_loa <- readxl::read_excel("inputs/08 - exercise - template loa.xlsx")

rCSIMealAdult should be analysed as a categorical variable not a numerical variable to get the proportion per day.

exercise_data <- exercise_data %>%
  add_weights(
    exercise_sampling_frame,
    "admin1", "strata", "population"
  )

exercise_design <- srvyr::as_survey_design(exercise_data, weights = "weights", strata = "admin1")

exercise_loa <- readxl::read_excel("inputs/09 - correction - loa.xlsx")

exercise_results <- create_analysis(exercise_design, loa = exercise_loa, sm_separator = "/")

Exercise 2

  • Review this analysis

There are no weights. The strata are admin1.

analysis_to_review <- readxl::read_excel("inputs/10 - exercise - analysis_to_review.xlsx")
dataset_to_review <- readxl::read_excel("inputs/10 - exercise - analysis_to_review.xlsx", sheet = "dataset")
loa_for_review <- analysis_to_review %>% 
  create_loa_from_results()

review_design <- srvyr::as_survey_design(dataset_to_review, strata = "admin1")
my_analysis_exercise <- create_analysis(review_design, loa = loa_for_review, sm_separator = "/")
Joining with `by = join_by(admin1, respondent_gender)`
Joining with `by = join_by(admin1, ind_gender)`
Joining with `by = join_by(admin1, caregiver_available)`
Joining with `by = join_by(admin1, difficulty_self_care)`
Joining with `by = join_by(admin1, edu_modality_v2)`
Joining with `by = join_by(admin1, liv_emerg_csi_3)`
Joining with `by = join_by(admin1, fs_hhs_nofood_yn)`
Joining with `by = join_by(admin1, wash_handwashingfacility_observed_water)`
Joining with `by = join_by(admin1, wash_handwashingfacility_observed_soap)`
Joining with `by = join_by(admin1, hoh_age)`
my_results_table_shorter <- my_analysis_exercise$results_table %>% 
  select(analysis_key, stat)

binded_results_table <- analysis_to_review %>% 
  full_join(my_results_table_shorter, by = "analysis_key")

exercise_review <- review_analysis(binded_results_table,
                                   stat_columns_to_review = "stat.x",
                                   stat_columns_to_compare_with = "stat.y", 
                                   analysis_key_column = "analysis_key")

exercise_review$review_table %>% 
  group_by(review_check,review_comment) %>% 
  tally()
review_check review_comment n
TRUE Same results 134

How would you review an analysis that does not have an analysis key? (discussion)

  • If the analysis is in long format, add the analysis key.
  • If the analysis is in a wide format, change to long format then add the analysis key.
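
For a long table that already has one result per line, adding the key amounts to pasting the columns together with the separators of the key format. The sketch below assumes the table uses the column names of the results table shown earlier and covers only the case of one analysis variable and one grouping variable; `long_results` is an illustrative name, and the two rows reuse values from this session.

```r
long_results <- data.frame(
  analysis_type = c("mean", "prop_select_one"),
  analysis_var = c("income_v1_salaried_work", "admin1"),
  analysis_var_value = c(NA, "admin1a"),
  group_var = c("admin1", NA),
  group_var_value = c("admin1b", NA),
  stat = c(19.7777778, 0.1666667)
)

# paste() writes missing values as the text "NA", as in the key format
long_results$analysis_key <- paste(
  long_results$analysis_type, "@/@",
  long_results$analysis_var, "%/%", long_results$analysis_var_value, "@/@",
  long_results$group_var, "%/%", long_results$group_var_value
)

long_results$analysis_key
```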

Outputs

library(presentresults)

my_results_table <- my_results$results_table

The following section introduces the outputs.

There are currently two types of table:

  • one that has the variables in the rows and the disaggregation in the columns,
  • one that has the disaggregation in the rows and the variables in the columns.

There are two steps to turn a results table into an output:

  • Turn the long results table into a wide results table.
  • Format and export it to Excel.
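
The first step, going from a long table to a wide one, can be illustrated with base R on a toy table. The values are made up and the object names are hypothetical; this only shows the idea of the reshaping and is not what the presentresults functions do internally.

```r
# Toy long table: one result per line (made-up values)
toy_long <- data.frame(
  group_var_value = c("admin1a", "admin1b", "admin1a", "admin1b"),
  analysis_var = c("income", "income", "expenditure", "expenditure"),
  stat = c(20.1, 19.8, 5.2, 6.1)
)

# One row per group, one stat column per variable
toy_wide <- reshape(
  toy_long,
  idvar = "group_var_value",
  timevar = "analysis_var",
  direction = "wide"
)

toy_wide
```

Here the groups end up in the rows and the variables in the columns, which corresponds to the second type of table listed above.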

create_*_group_x_variable

my_results_table %>% 
  create_table_group_x_variable() %>% 
  create_xlsx_group_x_variable(file_path = "outputs/04 - example - group_x_variable.xlsx", overwrite = TRUE)

create_*_variable_x_group

my_results_table %>% 
  create_table_variable_x_group() %>%
  create_xlsx_variable_x_group(file_path = "outputs/05 - example - variable_x_group.xlsx", overwrite = TRUE)

Tabular HTML

The folders 05 - reach_tabular_html_* are examples of Quarto projects. They can be used to produce tables in HTML format that can be shared quickly.

The params in the header can be changed. Then use Render to create the HTML output.

Note

The following is a work in progress. It will later become something like create_html_variable_x_group.

Exercise

Exercise 1

  • Create an Excel table with the strata in the rows and the variables in the columns.
library(presentresults)
exercise_outputs <- readxl::read_excel("inputs/10 - exercise - analysis_to_review.xlsx")
exercise_outputs %>% 
  create_table_group_x_variable() %>% 
  create_xlsx_group_x_variable(file_path = "outputs/06 - correction - group_x_variable_table.xlsx", overwrite = TRUE)

Exercise 2

  • Try the tabular html output.
  • Try to edit the authors, RCID and the introduction.